Convex Group Clustering of Large Geo-referenced Data Sets
Abstract
Clustering partitions a data set $S = \{s_1, \ldots, s_n\} \subset \mathbb{R}^m$ into groups of nearby points. Distance-based clustering uses optimisation criteria for defining the quality of the partition. Formulations using representatives (means or medians of groups) have received much more attention than minimisation of the total within group distance (TWGD). However, this non-representative approach has attractive properties while remaining distance-based. While representative approaches produce partitions with non-overlapping clusters, TWGD does not. We investigate the restriction of TWGD to producing convex-hull disjoint groups and show that this problem is NP-complete in the Euclidean case as soon as $m \geq 2$. Nevertheless we provide efficient algorithms for solving it approximately.

1 Introduction

Clustering is a fundamental task in data analysis since it identifies groups in heterogeneous data. Clustering can be seen as a concept formation or class delineation problem. At least the fields of statistics [44, 46], machine intelligence [5, 15, 32] and more recently knowledge discovery and data mining (KDDM) [12, 14, 37, 47] have contributed with algorithms for many clustering approaches. Hierarchical bottom-up approaches form groups by composition or merging items that are close together [10, 29]. However, top-down partition approaches to clustering are also interesting, in particular for spatial data mining [12, 37, 48]. This perspective defines clustering as partitioning a heterogeneous data set into smaller more homogeneous groups [2, 19, 40].

Clustering typically uses a metric (or distance) to determine the dissimilarity between the items to be clustered. Here we consider the clustering problem in the context of spatial databases, those typically associated with a Geographical Information System (GIS). In spatial settings, the clustering almost invariably makes use of some distance that captures the notion of proximity, as it reflects the essence of ...
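To make the TWGD criterion from the abstract concrete, the following is a minimal Python sketch under stated assumptions: it takes TWGD to be the unweighted sum of Euclidean distances over all unordered pairs of points that share a group, which may differ from the exact (possibly weighted) definition used in the paper. The function name `twgd` and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def twgd(points, labels):
    """Total within-group distance of a partition.

    Assumption: unweighted sum of Euclidean distances over all
    unordered pairs of points that carry the same group label.
    """
    total = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        if lp == lq:  # only pairs inside the same group contribute
            total += dist(p, q)
    return total


# Hypothetical 2-D data: three points near the origin, two far away.
points = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (10.0, 10.0), (10.0, 13.0)]
compact = [0, 0, 0, 1, 1]  # groups follow spatial proximity
mixed = [0, 1, 0, 1, 0]    # groups interleave the two regions

print(twgd(points, compact))
print(twgd(points, mixed))
```

Under this reading, no group representative (mean or median) is computed at all: the partition is scored purely by pairwise distances, and the spatially compact labelling scores lower than the interleaved one.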
Similar resources
Convex group clustering of large geo-referenced data sets
Clustering partitions a data set $S = \{s_1, \ldots, s_n\} \subset \mathbb{R}^m$ into groups of nearby points. Distance-based clustering methods use optimisation criteria to define the quality of a partition. Formulations using representatives (means or medians of groups) have received much more attention than minimisation of the total within group distance (TWGD). However, this non-representative approach has attractiv...
An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering
Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.
Convex structures via convex $L$-subgroups of an $L$-ordered group
In this paper, we first characterize the convex $L$-subgroup of an $L$-ordered group by means of four kinds of cut sets of an $L$-subset. Then we consider the homomorphic preimages and the product of convex $L$-subgroups. After that, we introduce an $L$-convex structure constructed by convex $L$-subgroups. Furthermore, the notion of the degree to which an $L$-subset of an $L$-ord...
Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers
Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering in which there is no need to be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...
Data Mining Techniques for Autonomous Exploration of Large Volumes of Geo-referenced Crime Data
We incorporate two knowledge discovery techniques, clustering and association-rule mining, into a fruitful exploratory tool for the discovery of spatio-temporal patterns. This tool is an autonomous pattern detector to reveal plausible cause-effect associations between layers of point and area data. We present two methods for this exploratory analysis and we detail algorithms to effectively expl...